INTRODUCTION


A reconfigurable computer system must distinguish between transient 
faults and permanent faults. A transient fault usually only causes incorrect 
behavior temporarily, and consequently the operating system should not 
permanently remove the affected component via reconfiguration. Unfortunately, 
transient faults appear to occur more frequently than permanent faults. The 
available empirical data show that transients occur about 10 times more 
frequently than permanent faults. (See ref. 1.) Thus, if an operating system 
removes too many processors affected by transient faults, then the reliability 
will be seriously compromised. The development of an effective 
transient/permanent fault discrimination algorithm is a critical problem for 
fault-tolerant computer system designers. The objective of this experiment is 
threefold: 

1. To gain some fundamental information concerning error latency and the 
error propagation process in the presence of injected transient faults 

2. To obtain the necessary data to perform a reliability analysis of the 
SIFT computer system (ref. 2) including the effects of permanent and transient 
faults 

3. To determine the effectiveness of the operating system's ability to 
discriminate between transient and permanent faults 

Only a small number of injections have been performed; therefore, 
statistically significant conclusions cannot yet be drawn. The purpose of 
this paper is to present the experimental approach and data analysis 
techniques in detail. 



SYMBOLS 


$W$                  random variable representing the duration of transient faults

$Z^*$                random variable representing the elapsed time from fault injection
                     until the last error appears

$R^*$                random variable representing the elapsed time from fault injection
                     until the system reconfigures

$Z$                  random variable representing the elapsed time from fault injection
                     until the last error appears, given that reconfiguration does not occur

$R$                  random variable representing the elapsed time from fault injection
                     until the system reconfigures, given that reconfiguration occurs

$\gamma$             arrival rate of transient faults

$\lambda_p$          arrival rate of permanent faults

$F_W(w)$             distribution of $W$

$F_{Z^*}(z)$         distribution of $Z^*$

$F_Z(z)$             distribution of $Z$

$F_{R^*}(r)$         distribution of $R^*$

$F_R(r)$             distribution of $R$

$F_L(t)$             distribution of fault latency

$F_p(t)$             distribution of permanent fault reconfiguration time

$F_{R|W}(r \mid w)$  conditional distribution of $R$ given $W$

$F_{Z|W}(z \mid w)$  conditional distribution of $Z$ given $W$

$E[\,\cdot\,]$       expected value operator

$\mu(\cdot)$         mean of a distribution

$\sigma^2(\cdot)$    variance of a distribution

EXPERIMENTAL APPROACH 

In this experiment, transient faults with a particularly simple waveform 
are injected: 



Clearly, there are an infinite number of possible transient waveforms. Since 
the characteristics of naturally occurring transient waveforms are unknown, we 
begin with this simple waveform. The fault is held active (either 
stuck-at-1 or stuck-at-0) for W microseconds. 

A transient fault may or may not generate errors which are detectable by 
the operating system's voters. The following two time-graphs illustrate the 
two possible effects of a transient fault: 

Case 1: Reconfiguration does not occur

    --+----+----+----+----+----+----+--- ... ---+--------+----> t
      s    e1   e2   e3   e4   e5   e6          en       C

      |<------------------- Z ----------------->|

Case 2: Reconfiguration occurs

    --+----+----+----+----+----+----+--- ... ---+----+--------> t
      s    e1   e2   e3   e4   e5   e6          en   r

      |<--------------------- R -------------------->|

where

    s     =  time fault injection is initiated
    e_i   =  time of detection of the ith error (1 <= i <= n)
    r     =  time the operating system reconfigures
    C     =  censoring point (i.e., the point where experimental observation
             is terminated)
    Z     =  e_n - s
    R     =  r - s

These two cases represent the outcome of two competing processes: the 
disappearance of the transient and the reconfiguration process of the 
operating system. In the first case, Z is a random variable which 
represents the duration of transient errors given that reconfiguration does 
not occur. In the second case, R is a random variable which represents the 
reconfiguration time given that reconfiguration occurs. Since the operating 
system does not record error detections after the reconfiguration process, 
the time of disappearance of transient errors can only be observed when 
reconfiguration does not occur. Similarly, no information is available about 
the reconfiguration process when the errors disappear first and no 
reconfiguration takes place. Thus, although one can postulate the existence of 
some theoretical underlying competing distributions, say $F_{R^*}(r)$ and 
$F_{Z^*}(z)$, only the conditional distributions 

$$F_R(r) = \mathrm{Prob}[\,R \le r\,] = \mathrm{Prob}[\,R^* \le r \mid R^* < \infty\,]$$

$$F_Z(z) = \mathrm{Prob}[\,Z \le z\,] = \mathrm{Prob}[\,Z^* \le z \mid R^* = \infty\,]$$

can be directly observed. Furthermore, it has been shown that it is 
impossible to identify the underlying distributions given the conditional 
distributions and that, although the actual underlying distributions may not be 
independent, the stochastic behavior can be accurately modeled as competing 
independent processes. (See ref. 3.) Therefore, the use of a semi-Markov 
model to describe this phenomenon is justified. The notation $R^* = \infty$ 
indicates the case where reconfiguration does not occur. If no errors are 
detected in an injection, Z is defined to be zero. The results of each 
injection can only be observed for a finite time. The censoring point C of 
this preliminary experiment was two minutes. Consequently, the effects of a 
fault with an extremely long latency period would be missed. 

The following model describes the response of the operating system to 
transient faults with exponential arrival rate $\gamma$: 



Since we are dealing with experimental data that are conditional, the 
nonexponential transitions will be labeled with the conditional distributions. 
In the above model, the transition from (0) to (1) is the arrival of a 
transient fault with exponential rate $\gamma$. The transition from (1) to (0) 
is the disappearance of the transient errors. Given that this transition 
occurs, the total elapsed time of the transition is a sample from the 
distribution $F_Z(z)$. The transition from (1) to (2) is the removal of the 
faulty processor via reconfiguration. The distribution of reconfiguration time 
is $F_R(r)$. The probability $P_R = \mathrm{Prob}[\,R^* < \infty\,]$ is the 
probability that the transition (1) to (2) occurs. (The probability that the 
transition (1) to (0) occurs is $1 - P_R$.) 

As mentioned earlier, each transient fault injection is performed by 
physically holding a fault active for a predetermined duration W. Since this 
can only be done for a finite set of predetermined durations, we can only 
observe the random variables R and Z in response to particular transient 
fault durations $(w_1, w_2, w_3, \ldots, w_k)$. Thus, we actually observe 
samples from the conditional distributions 

    $F_{R|W}(t \mid w_i)$   (i.e., the distribution of recovery time given that
                            the transient fault is active for duration $w_i$)

    $F_{Z|W}(t \mid w_i)$   (i.e., the distribution of the time of disappearance
                            of errors given that the transient fault is active
                            for duration $w_i$)

The probability 

$$P_R(w) = \mathrm{Prob}[\,R^* < \infty \mid W = w\,]$$

corresponds to the fraction of times the system reconfigures in the presence 
of a transient fault of duration w. 

Mathematically, 

$$F_R(r) = \int_0^\infty F_{R|W}(r \mid w)\, dF_W(w)$$

$$F_Z(z) = \int_0^\infty F_{Z|W}(z \mid w)\, dF_W(w)$$

$$P_R = \int_0^\infty P_R(w)\, dF_W(w)$$

where $F_W(w)$ is the distribution of transient fault durations. The 
motivation for performing the experiment in this manner is that the 
distribution of transient fault durations $F_W(w)$ is unknown. If experimental 
data were available for $F_W(w)$, then the transient fault durations could be 
sampled randomly and Z and R could be measured directly. This indirect 
method enables us to construct $F_Z(z)$ and $F_R(r)$ under various assumptions 
about $F_W(w)$. 


FAULT INJECTION METHOD AND DATA CAPTURE 

It is impossible to perform transient fault injections at every pin in a 
processor for all possible transient fault durations. Thus, the fault 
injection locations were chosen randomly, weighted according to the chip 
failure rates, and a small set of transient-fault durations (to be injected at 
every randomly selected pin) was predetermined. The chip failure rates were 
determined using MIL-STD-217D. A list of the failure rates used for the chips 
in the SIFT processors is provided in Appendix A. The fault durations 
were not chosen to be equally far apart (i.e., with equal successive 
differences). Such spacing is impractical since the fault latency (i.e., time 
from injection until first error detection) is several orders of magnitude 
longer for some pins than for others. A spacing appropriate for one pin 
location in the processor would not be appropriate for another. Consequently, 
the natural logarithms of the fault durations were chosen to be equally far 
apart. The following injection durations were used: 

1 μs, 3.16 μs, 10 μs, 31.62 μs, 100 μs, 316.22 μs, 1 ms, 3.162 ms, 

10 ms, 31.62 ms, 100 ms, 316.22 ms, 1 s. 
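
The list of durations above can be regenerated with a short sketch. It assumes, as the listed values suggest, that successive durations differ by half a decade (a factor of the square root of 10); the names `EXPONENT_STEP`, `NUM_DURATIONS`, and `durations_us` are illustrative and do not come from the experiment software. The generated values agree with the listed ones to within rounding.

```python
# Logarithmically spaced fault durations, from 1 microsecond to 1 second.
# Assumption: the spacing is half a decade, inferred from the listed values.
EXPONENT_STEP = 0.5   # step in log10(duration in microseconds)
NUM_DURATIONS = 13    # 1 us up to 1 s inclusive

durations_us = [10 ** (EXPONENT_STEP * k) for k in range(NUM_DURATIONS)]

for w in durations_us:
    print(f"{w:12.2f} us")
```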

The SIFT operating system was instrumented to obtain the time of each error 
detection on the non-injected processors. This time was obtained on each 
processor from a global clock with millisecond resolution. Since error 
detection is accomplished by voting, error detection is possible only in 
subframes where voting occurs. The SIFT schedule table including the number 
of variables voted per subframe is shown below: 


    subframe    clock tic    task     # variables voted

        1           0        CLKTA
        2           2        ICT1
        3           6        ICT2            3
        4           9        ICT3
        5          14        MLS             1
        6          16        GUIDA           3
        7          18        PITCH           6
        8          20        LATER           4
        9          22        ERRTA           2
       10          24        NULLT
       11          26        ICT1
       12          30        ICT2            3
       13          33        ICT3
       14          38        MLS             1
       15          40        GUIDA           3
       16          42        PITCH           6
       17          44        LATER           4
       18          46        FAULT           2
       19          49        NULLT           2
       20          51        ICT1
       21          55        ICT2
       22          58        ICT3
       23          63        MLS
       24          65        GUIDA
       25          67        PITCH
       26          69        LATER
       27          71        RECFT


The SIFT scheduler repeatedly executes the sequence of tasks enumerated in the 
table above. Each execution of these tasks constitutes a global frame. 

The error task ERRTA counts the number of vote errors since its last 
execution. If this count exceeds the value of parameter THRESHOLD 
(arbitrarily set to 3) then it sets an error flag ERR[p] indicating that it 
has diagnosed processor p as faulty during this global frame. The 
fault-isolation task FAULT retrieves a voted version of ERR[p]. If ERR[p] is 
true for a processor p for K (arbitrarily set to 2) consecutive global 
frames then the fault-isolation task tells the reconfiguration task RECFT to 
remove processor p. In the following diagram, E represents the ERRTA 
task, F represents the fault-isolation task FAULT and R represents the 
reconfiguration task RECFT: 


  --+----+----+-----------------+----+----+-----------------+----+----+---> t
    E    F    R                 E    F    R                 E    F    R
    |<---- frame length (G) --->|                           |<-- δ -->|
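
The diagnosis rule described above (ERRTA flags a processor when the vote-error count exceeds THRESHOLD, and FAULT requests removal after K consecutive flagged frames) can be sketched as follows. This is a simplified illustration rather than the SIFT code: it assumes one error count per global frame for a single processor and ignores the voting of ERR[p] across processors. The function name `frame_of_removal` is hypothetical.

```python
from typing import Optional, Sequence

THRESHOLD = 3  # vote-error count that must be exceeded (value given in the text)
K = 2          # consecutive flagged global frames required (value given in the text)


def frame_of_removal(errors_per_frame: Sequence[int],
                     threshold: int = THRESHOLD,
                     k: int = K) -> Optional[int]:
    """Return the index of the global frame in which removal is requested.

    errors_per_frame[i] is the number of vote errors ERRTA counts in frame i.
    Returns None if the processor is never flagged in k consecutive frames.
    """
    consecutive = 0
    for frame, count in enumerate(errors_per_frame):
        if count > threshold:   # ERRTA sets ERR[p] for this frame
            consecutive += 1
        else:                   # a clean frame resets the run of flagged frames
            consecutive = 0
        if consecutive >= k:    # FAULT asks RECFT to remove the processor
            return frame
    return None


print(frame_of_removal([4, 0, 4, 4]))
```

Under this rule a burst of errors confined to a single global frame never causes removal, which is how the operating system tolerates short transients.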

If a fault generates errors at a rate greater than THRESHOLD/G then the 
reconfiguration time will vary between $(K-1)G + \delta$ and $KG + \delta$, 
where $\delta$ is the interval marked to the right of the frame length in the 
diagram. The value of $\delta$ could be reduced by moving the ERRTA and FAULT 
tasks immediately before the RECFT task. This would reduce the mean 
reconfiguration time in SIFT. The following illustrates a typical sequence of 
errors detected by the SIFT processors when a fault is injected on 
processor 1: 
                          GLOBAL FRAME 27

    SF       P1       P3         P4         P5         P6

     7       --      5(3)       5(3)       5(3)       5(3)
     8       --      8(2)       8(2)       8(2)       8(2)
    15       --     37(2)      37(2)      37(2)      37(2)
    16       --     40(3)      40(3)      40(3)      40(3)
    17       --     43(2)      43(2)      43(2)      43(2)
    18       --     48(1)      48(1)      48(1)      48(1)
    24       --     74(2)      74(2)      74(2)      74(2)
    25       --     77(3)      77(3)      77(3)      77(3)
    26       --     80(2)      80(2)      80(2)      80(2)

                          GLOBAL FRAME 28

    SF       P1       P3         P4         P5         P6

     6       --    109(2)     109(2)     109(2)     109(2)
     7       --    112(3)     112(3)     112(3)     112(3)
     8       --    115(2)     115(2)     115(2)     115(2)
    16       --    147(3)     147(3)     147(3)     147(3)
    17       --    150(2)     150(2)     150(2)     150(2)
    18       --    155(2)     155(2)     155(2)     155(2)
    24       --    181(2)     181(2)     181(2)     181(2)
    26       --    187(2)!    187(2)!    187(2)!    187(2)!

    RECONFIGURATION  187        187        187        187


The first column is the subframe; the remaining columns contain the 
error-detection times observed on each processor. For example, the column of 
numbers under the header P4 contains the times that processor P4 detected 
faults on processor P1. The number in parentheses following each 
error-detection time is the number of vote errors at that time. 

A summary of the results of the 297 transient fault injections is given 
in Appendix B. 


TRANSIENT FAULT CLASSIFICATION 


The errors produced by the injected transient faults fell into the 
following classes: 


Transient Null        - the injected fault produced no errors

Transient Benign      - the injected fault produced a finite sequence of
                        errors, followed by correct operation of the
                        processor

Transient Persistent  - the injected fault produced a non-terminating
                        sequence of errors (until reconfiguration)

The transient persistent fault's behavior is indistinguishable from a 
permanent fault while the system is operating. However, when the injected 
processor is manually restarted, it operates properly. One way that a 
transient fault can disable a processor is by crashing the microcode. 

Although the physical cause of the fault is temporary, the effect is 
permanent. Transient persistent faults should be diagnosed as permanent by 
the operating system and removed. 

Because the error generation process cannot be observed indefinitely, it 
is impossible to differentiate exactly between the benign and persistent 
classes. Furthermore, since the SIFT operating system reconfigures in the 
presence of errors (terminating the error observation process), the problem of 
distinguishing between the two classes is further complicated. In the section 
entitled "Future Experimental Directions" a new experimental approach to this 
problem is described. 

In the following summary tables, the following assumptions were made: 

(1) If the system reconfigured and errors persisted up to the 
reconfiguration point, it is assumed that the operating system 
properly diagnosed the fault as persistent. 

(2) If the system reconfigured, but errors disappeared at least 5 ms 
prior to the reconfiguration, it is assumed that the operating system 
improperly diagnosed a benign fault as persistent. 

(3) If the fault produced errors but the system did not reconfigure, then 
it is assumed that the fault was benign. 

(4) If a fault had not generated an error within the censoring time of 
the experiment (i.e., 1 minute), the fault was null. 
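
The four assumptions above amount to a small decision procedure, sketched below. The function `classify` and its arguments are hypothetical names introduced for illustration; times are in milliseconds, the list of error times is assumed to contain only detections made before the censoring point, and the 5 ms division is taken from assumption (2).

```python
from typing import Optional, Sequence

GAP_MS = 5.0  # division between persistent and benign, from assumption (2)


def classify(error_times_ms: Sequence[float],
             reconfig_time_ms: Optional[float]) -> str:
    """Classify one injection from its error-detection times.

    reconfig_time_ms is None when the system did not reconfigure.
    """
    if not error_times_ms:
        return "null"                              # assumption (4)
    if reconfig_time_ms is None:
        return "benign"                            # assumption (3)
    gap = reconfig_time_ms - max(error_times_ms)   # time from last error to reconfiguration
    if gap >= GAP_MS:
        return "benign (improperly reconfigured)"  # assumption (2)
    return "persistent"                            # assumption (1)


print(classify([109.0, 150.0, 187.0], 187.0))
```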
                              CPU

    W (μsec)              |  Null  |  Benign  |  Persistent

         0 - 10           |  .75   |   .07    |    .18
        10 - 100          |  .36   |   .15    |    .49
       100 - 1000         |  .31   |   .15    |    .54
      1000 - 10000        |  .00   |   .11    |    .89
     10000 - 100000       |  .00   |   .21    |    .79
    100000 - 1000000      |  .00   |   .05    |    .95


                             Memory

    W (μsec)              |  Null  |  Benign  |  Persistent

         0 - 10           |  .60   |   .03    |    .37
        10 - 100          |  .30   |   .10    |    .60
       100 - 1000         |  .17   |   .11    |    .72
      1000 - 10000        |  .00   |   .00    |   1.00



RELIABILITY ANALYSIS OF SIFT 


In this section a methodology for performing a reliability analysis of 
the SIFT computer system subject to transient and permanent faults will be 
presented. The methodology will be illustrated by application to a 4- 
processor SIFT system. The following assumptions govern this model: 

1. The system initially consists of four statistically independent 
processors which fail permanently at constant rate $\lambda_p$ and 
transiently at constant rate $\gamma$. 

2. Each processor executes the exact same program on exactly the same 
inputs so that all non-faulty processors produce exactly the same 
output. The system "votes" the outputs prior to external use. Thus, 
so long as a majority of the processors are non-faulty, any erroneous 
values are "masked". 

3. The system removes the faulty processors via reconfiguration. The 
first reconfiguration reduces the system to a triplex configuration. 
A second reconfiguration reduces the system to a simplex. 
4. The distribution of reconfiguration time $F_R(r)$ is unknown and must 
be determined experimentally. 

5. The distributions of transient error duration $F_Z(z)$ and transient 
fault duration $F_W(w)$ are unknown. No experimental data are available 
(nor does this experiment provide any) for these distributions. 


The computation of the probability of system failure based on this model 
will be performed using the Semi-Markov Unreliability Range Evaluator (SURE) 
program. (See ref. 4.) A key advantage of the SURE program lies in its use 
of means and variances of the unknown distributions. It is not necessary to 
assume some family of underlying distributions and perform distribution-fitting 
procedures. The SURE input file describing this model is: 


    LAMBDA  = 2E-4;         (* permanent fault arrival rate *)
    GAMMA   = 10*LAMBDA;    (* transient fault arrival rate *)
    P_R     =               (* probability of reconfiguration P_R *)
    MU_R    =               (* mean reconfiguration time mu(F_R) *)
    SIGMA_R =               (* stan. dev. of reconf. time sigma(F_R) *)
    MU_Z    =               (* mean last error time mu(F_Z) *)
    SIGMA_Z =               (* stan. dev. of last error sigma(F_Z) *)
    MU_P    =               (* mean permanent reconf. time mu(F_p) *)
    SIGMA_P =               (* stan. dev. perm. reconf. time sigma(F_p) *)

    1,2   = 4*GAMMA;
    2,3   = 3*GAMMA + 3*LAMBDA;
    1,4   = 4*LAMBDA;
    4,5   = 3*GAMMA;
    2,5   = 3*LAMBDA;

    4,6   = 3*LAMBDA + 3*GAMMA;
    2,7   = <MU_R, SIGMA_R, P_R>;
    2,1   = <MU_Z, SIGMA_Z, 1-P_R>;
    4,7   = <MU_S, SIGMA_S>;
    7,8   = 3*GAMMA;

    8,9   = 2*GAMMA + 2*LAMBDA;
    7,10  = 3*LAMBDA;
    10,12 = 2*LAMBDA + 2*GAMMA;
    10,11 = 2*GAMMA;

    8,12  = 2*LAMBDA;
    8,13  = <MU_R, SIGMA_R, P_R>;
    10,13 = <MU_S, SIGMA_S>;
    8,7   = <MU_Z, SIGMA_Z, 1-P_R>;
    13,14 = GAMMA + LAMBDA;

The graphical display of this model in figure 1 was generated by the SURE 
program. The complete input file to the SURE program including the 
calculation of the means and variances is given in Appendix C. 



Figure 1.- Reliability model of 4-processor SIFT. 
The following non-parametric statistics must be estimated from the 
experimental data: 

$$P_R = \int_0^\infty P_R(w)\, dF_W(w)$$

$$\mu(F_R) = \int_0^\infty E[R \mid W = w]\, dF_W(w)$$

$$\sigma^2(F_R) = E[R^2] - (E[R])^2 = \int_0^\infty E[R^2 \mid W = w]\, dF_W(w) - [\mu(F_R)]^2$$

$$\mu(F_Z) = \int_0^\infty E[Z \mid W = w]\, dF_W(w)$$

$$\sigma^2(F_Z) = E[Z^2] - (E[Z])^2 = \int_0^\infty E[Z^2 \mid W = w]\, dF_W(w) - [\mu(F_Z)]^2$$

$$\mu(F_p) = E[R \mid W = \infty]$$

$$\sigma^2(F_p) = E[R^2 \mid W = \infty] - [\mu(F_p)]^2$$



Since we only have measurements of $E[R \mid W = w_i]$, $E[Z \mid W = w_i]$, 
$E[R^2 \mid W = w_i]$ and $E[Z^2 \mid W = w_i]$ for a few values of $w_i$, we 
are forced to approximate the integral with a numerical method: 

$$P_R = \int_0^\infty P_R(w)\, dF_W(w)
      = \sum_{i=1}^{k+1} \int_{w_{i-1}}^{w_i} P_R(w)\, dF_W(w)
      \quad \text{where } w_0 = 0 \text{ and } w_{k+1} = \infty$$

$$\approx \sum_{i=1}^{k} \hat{P}_R(w_i)\, [F_W(w_i) - F_W(w_{i-1})] = \hat{P}_R$$

Similarly 

$$\hat{\mu}(F_R) = \sum_{i=1}^{k} \hat{E}[R \mid W = w_i]\, [F_W(w_i) - F_W(w_{i-1})]$$

$$\hat{\sigma}^2(F_R) = \sum_{i=1}^{k} \hat{E}[R^2 \mid W = w_i]\, [F_W(w_i) - F_W(w_{i-1})] - \hat{\mu}^2(F_R)$$

$$\hat{P}_Z = 1 - \hat{P}_R$$

$$\hat{\mu}(F_Z) = \sum_{i=1}^{k} \hat{E}[Z \mid W = w_i]\, [F_W(w_i) - F_W(w_{i-1})]$$

$$\hat{\sigma}^2(F_Z) = \sum_{i=1}^{k} \hat{E}[Z^2 \mid W = w_i]\, [F_W(w_i) - F_W(w_{i-1})] - \hat{\mu}^2(F_Z)$$
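
A minimal sketch of the weighted-sum approximation follows, applied to the estimate of the reconfiguration probability. The observed fractions are the values tabulated later in this section; the exponential duration distribution with a mean of 100 microseconds is purely an assumed example, and the names `weighted_sum` and `exponential_cdf` are illustrative. The sketch takes the first interval to start at zero and, like the approximation in the text, stops at the largest injected duration.

```python
import math
from typing import Callable, Sequence

# Injected durations in microseconds and the observed fractions of
# injections that led to reconfiguration (values from the table in this section).
W_US = [1, 3.16, 10, 31.62, 100, 316.22, 1e3, 3.162e3,
        1e4, 3.162e4, 1e5, 3.1622e5, 1e6]
P_R_OBS = [.17, .30, .37, .53, .69, .54, .86, 1.00,
           1.00, .95, .90, 1.00, 1.00]


def weighted_sum(values: Sequence[float],
                 durations: Sequence[float],
                 cdf: Callable[[float], float]) -> float:
    """Sum of values[i] * (F_W(w_i) - F_W(w_{i-1})), with F_W(w_0) taken as 0."""
    total = 0.0
    previous = 0.0
    for value, w in zip(values, durations):
        current = cdf(w)
        total += value * (current - previous)
        previous = current
    return total


def exponential_cdf(mean_us: float) -> Callable[[float], float]:
    """Exponential distribution function with the given mean (assumed example)."""
    return lambda w: 1.0 - math.exp(-w / mean_us)


p_r_hat = weighted_sum(P_R_OBS, W_US, exponential_cdf(100.0))
print(f"estimated P_R = {p_r_hat:.3f}")
```

The same function yields the estimated means and second moments when the observed conditional expectations are passed in place of `P_R_OBS`.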


The mean and variance of the distribution of reconfiguration time 
in the presence of permanent faults, $F_p$, can be estimated using the 
following simple unbiased estimators: 

$$\hat{\mu}(F_p) = \frac{1}{n} \sum_{i=1}^{n} r_i$$

$$\hat{\sigma}^2(F_p) = \frac{1}{n-1} \sum_{i=1}^{n} (r_i - \hat{\mu})^2$$

where $(r_1, r_2, \ldots, r_n)$ is a random sample obtained via permanent fault 
injection. A histogram of this sample is given in figure 2. The mean 
reconfiguration time is 272.6 ms and the standard deviation is 121.5 ms. 
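
For illustration, the two estimators can be evaluated with the Python standard library. The sample below is hypothetical and is not the measured data; `statistics.variance` divides by n - 1, which matches the unbiased variance estimator.

```python
import statistics

# Hypothetical reconfiguration times in milliseconds (not measured data).
sample_ms = [250.0, 180.0, 310.0, 275.0, 420.0]

mean_ms = statistics.mean(sample_ms)           # sum(r_i) / n
variance_ms2 = statistics.variance(sample_ms)  # sum((r_i - mean)^2) / (n - 1)
std_ms = variance_ms2 ** 0.5

print(f"mean = {mean_ms:.1f} ms, standard deviation = {std_ms:.1f} ms")
```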
Figure 2.- Reconfiguration time histogram (permanent faults). Horizontal axis: TIME (msec).

The following values of $\hat{P}_R(w_i)$, $\hat{E}[R \mid W = w_i]$, 
$\hat{E}[Z \mid W = w_i]$, $\hat{E}[R^2 \mid W = w_i]$ and 
$\hat{E}[Z^2 \mid W = w_i]$ were observed: 


    w_i          P_R(w_i)   E[Z|W=w_i]   E[R|W=w_i]   E[Z^2|W=w_i]   E[R^2|W=w_i]

    1 μs           .17         0.1         239.0           0.16        57282.60
    3.16 μs        .30        66.3         350.9       92402.34       181682.00
    10 μs          .37         0.0         239.4           0.00        58599.27
    31.62 μs       .53        47.3         255.9       15875.00        68679.12
    100 μs         .69         0.0         247.1           0.00        62566.75
    316.22 μs      .54         2.5         238.3          40.00        57448.20
    1 ms           .86         0.0         284.3           0.00       108444.48
    3.162 ms      1.00         --          256.9           --          66597.96
    10 ms         1.00         --          249.1           --          62921.78
    31.62 ms       .95       282.0         248.4       79524.00        62708.11
    100 ms         .90        99.0         255.7        9801.00        66016.66
    316.22 ms     1.00         --          251.5           --          64170.50
    1 s           1.00         --          236.7           --          56681.30

Since the distribution of transient fault durations is unknown and no 
experimental data are available, a sensitivity analysis will be performed under 
the assumption of three different families of distributions. If experimental 
data were available for $F_W(w)$, then the transient fault durations could be 
sampled randomly and Z and R could be measured directly. In that case, 
this indirect calculation would be unnecessary. The exponential, uniform and 
Weibull distributions will be analyzed. 

Analysis Assuming Exponential Transient Duration 

In this section the reliability analysis is performed under the 
assumption that the duration of transient faults is exponentially 
distributed. Thus, 

$$F_W(w) = 1 - e^{-\phi w}$$

for some $\phi$. The probability of system failure as a function of 
$\mu(F_W) = 1/\phi$ is given in figure 3. 
Analysis Assuming Uniform Transient Duration 


In this section the reliability analysis is performed under the 
assumption that the duration of transient faults is uniformly 
distributed. Thus, 

$$F_W(w) = \begin{cases} w/\beta & 0 \le w \le \beta \\ 1 & \beta < w \end{cases}$$

for some $\beta$. The probability of system failure as a function of 
$\mu(F_W)$ is given in figure 4. 


Figure 4.- Prob. of failure vs. $\mu(F_W)$ for uniform $F_W$.



Analysis Assuming Weibull Transient Duration 


In this section the reliability analysis is performed under the 
assumption that the duration of transient faults has a Weibull 
distribution: 

$$F_W(w) = 1 - e^{-\phi w^{\alpha}}$$
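
The three candidate families can be written as ordinary functions, which makes it easy to substitute any of them into the weighted sums of the preceding section. This is a sketch only: the parameter names `phi`, `beta`, and `alpha` are labels chosen here for the rate, the upper limit, and the shape parameter, and the mean formulas are the standard ones for these parameterizations.

```python
import math


def exponential_cdf(w: float, phi: float) -> float:
    """F_W(w) = 1 - exp(-phi * w) for w > 0."""
    return 1.0 - math.exp(-phi * w) if w > 0 else 0.0


def uniform_cdf(w: float, beta: float) -> float:
    """F_W(w) = w / beta on [0, beta], and 1 above beta."""
    if w <= 0:
        return 0.0
    return min(w / beta, 1.0)


def weibull_cdf(w: float, phi: float, alpha: float) -> float:
    """F_W(w) = 1 - exp(-phi * w**alpha) for w > 0."""
    return 1.0 - math.exp(-phi * w ** alpha) if w > 0 else 0.0


def exponential_mean(phi: float) -> float:
    return 1.0 / phi


def uniform_mean(beta: float) -> float:
    return beta / 2.0


def weibull_mean(phi: float, alpha: float) -> float:
    # Scale is phi**(-1/alpha); the mean is scale * Gamma(1 + 1/alpha).
    return phi ** (-1.0 / alpha) * math.gamma(1.0 + 1.0 / alpha)


print(exponential_mean(0.01), uniform_mean(200.0), weibull_mean(0.01, 1.0))
```

With `alpha` equal to 1 the Weibull family reduces to the exponential family, which gives a quick consistency check between the two.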




Sensitivity to $F_W(w)$ 


A comparison of figures 3, 4 and 5 reveals that the probability of system 
failure is only moderately sensitive to the different shapes of the 
distributions. However, the unreliability varies over two orders of magnitude 
depending upon the mean $\mu(F_W)$. Once again the reader is cautioned that 
this observation is based on a very small sample and the assumption that $F_W$ 
comes from one of these three families of distributions. 

EFFECTIVENESS OF SIFT'S TRANSIENT/PERMANENT FAULT DISCRIMINATOR 

In this section some preliminary observations are made about the 
effectiveness of the SIFT transient/permanent fault discrimination algorithm. 
There are two ways the operating system can incorrectly diagnose a fault: 

(1) A permanent fault generates intermittent errors that are 
indistinguishable from the errors produced by two or more transient 
faults and thus is not reconfigured. 

(2) The time that the operating system waits to see if a fault is 
transient is not long enough to recognize a particularly long 
transient and consequently reconfigures before the fault disappears. 

The first case was not observed during the experiment (i.e., all 
permanent faults were successfully reconfigured). There were many cases where 
the transient injection resulted in the injected processor being reconfigured 
out of the system. However, whether this was a correct decision depends on 
whether the fault was transient benign or transient persistent. If the fault 
was transient persistent, the decision was correct. If the fault was transient 
benign, the decision was incorrect. Because the error generation process was 
not observed after a reconfiguration, the type of fault could not be 
ascertained with certainty from the experimental data. Thus, it was 
impossible to determine whether the errors of a transient benign fault would 
have disappeared at some time after the reconfiguration. Also, if the errors 
had disappeared shortly before the reconfiguration (suggesting that the fault 
was transient benign), there was no way to determine whether this was merely a 
temporary lapse in the error sequence of a transient persistent fault. In the 
section entitled "FUTURE EXPERIMENTAL DIRECTIONS" a modification to the 
experimental approach is presented which makes it easier to distinguish 
transient benign faults from transient persistent faults. The rest of this 
section gives a simple analysis of the data that provides some indication of 
the operating system's ability to distinguish these types of faults. 

Let S be the time between the last error detection and the 
reconfiguration. If a fault generates errors up to the time of 
reconfiguration (i.e. S = 0), then the diagnosis as permanent is probably 
correct. However, if S is large, then most likely the fault was improperly 
diagnosed. The distribution of S is given in figure 6. 



Figure 6.- Histogram of S = R - Z*. Horizontal axis: TIME (msec), 0 to 125.
There were 190 fault injections which resulted in a reconfiguration. Of 
these, 167 had a value of S less than one clock tick (1.6 ms). Of the 
remaining 23 injections, only 2 had a value of S less than 31 ms, as the 
following table shows: 


         S         |   # of occurrences

      0 - 1 ms     |        167
      2 - 5 ms     |          0
        6 ms       |          2
      7 - 31 ms    |          0
       > 31 ms     |         21


Exactly where the division should be made between the transient benign and 
transient persistent classes is not obvious. Since there were very few 
injections with values of S between 2 ms and 31 ms, any choice in this range 
would yield essentially the same results. In the following calculations, the 
division is made at 5 ms. Therefore all faults with a value of S less than 
5 ms are assumed to be transient persistent and those with a value of S greater 
than 5 ms are assumed to be transient benign. Using this classification scheme, 
the percentage of improperly reconfigured faults was: 


$$\%\ \text{error} = \frac{\#\ \text{transient benign faults reconfigured}}{\#\ \text{reconfigurations}} \times 100\% = \frac{23}{190} \times 100\% = 12.1\%$$


There were 9 injections which produced errors but did not lead to a 
reconfiguration. Assuming that these faults were correctly diagnosed as 
transient benign, the percentage of transient benign faults which were 
improperly diagnosed was: 

$$\%\ \text{error} = \frac{\#\ \text{transient benign faults reconfigured}}{\#\ \text{transient benign faults}} \times 100\% = \frac{23}{32} \times 100\% = 71.9\%$$


Since most of the faults injected were transient persistent the 
percentage of improperly reconfigured faults was small (12.1%). However, of 
the faults that should not have been reconfigured (transient benign), 71.9% 
were improperly reconfigured. 
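
The two percentages follow directly from the counts reported above, as this short worked calculation shows. The variable names are illustrative only.

```python
# Counts reported in this section.
reconfigurations = 190           # injections that ended in a reconfiguration
persistent_reconfigured = 167    # S below the 5 ms division
benign_not_reconfigured = 9      # produced errors but no reconfiguration

benign_reconfigured = reconfigurations - persistent_reconfigured
benign_total = benign_reconfigured + benign_not_reconfigured

pct_of_reconfigurations = 100.0 * benign_reconfigured / reconfigurations
pct_of_benign = 100.0 * benign_reconfigured / benign_total

print(f"{pct_of_reconfigurations:.1f}% of reconfigurations removed a benign fault")
print(f"{pct_of_benign:.1f}% of benign faults were reconfigured")
```

The contrast between the two figures comes entirely from the choice of denominator: benign faults are rare among all reconfigurations but are usually reconfigured when they do occur.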

DISTRIBUTION OF FAULT LATENCY 

The data obtained in this experiment are sufficient to determine fault 
latency in SIFT using the methodology developed by the University of Michigan. 
(See ref. 5.) Consider the following graph of the fault propagation process: 

      fault arrival         error generated        error detected
            |                      |                      |
    --------+----------------------+----------------------+--------> t
            |<--- fault latency -->|<--- error latency -->|

Let L be a random variable representing the fault latency with 
distribution function $F_L(l)$ and let $n_i(w)$ represent the total number of 
transient fault injections at pin i with duration w. If $D_i(w)$ is a 
random variable representing the number of injections which result in at least 
one error detection, then 

$$E[\,D_i(w)/n_i(w)\,] \le F_L(w)$$

Under the assumption that errors generated by the injected fault are 
propagated and detected before the censoring point of the experiment, 

$$E[\,D_i(w)/n_i(w)\,] = F_L(w)$$

If $d_i(w)$ detections are observed in response to $n_i(w)$ injections, the 
following estimator of $F_L(w)$ is unbiased: 

$$\hat{F}_L(w) = d_i(w)/n_i(w) \approx F_L(w).$$

The following measurements were obtained: 


    w_i          d_i(w_i)    n_i(w_i)    F_L(w_i)

    1 μs             6          30         .20
    3.16 μs         10          30         .33
    10 μs           11          30         .37
    31.62 μs        19          30         .63
    100 μs          20          29         .69
    316.22 μs       17          28         .61
    1 ms            25          29         .86
    3.162 ms        24          24        1.00
    10 ms           18          18        1.00
    31.62 ms        19          19        1.00
    100 ms          10          10        1.00
    316.22 ms       10          10        1.00
    1 s             10          10        1.00


Although $F_L(w)$ must be monotonically increasing, the estimates 
$\hat{F}_L(w_i)$ may not be. Clearly, a statistical method of estimating the 
$F_L(w_i)$ under a constraint of monotonicity is needed. 
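
One standard way to impose such a constraint is the pool-adjacent-violators algorithm (isotonic regression), sketched below on the counts from the table above. This technique is offered here as a possible approach and is not a method used in the experiment; the function name `monotone_estimates` is hypothetical.

```python
from typing import List, Sequence


def monotone_estimates(detections: Sequence[int],
                       injections: Sequence[int]) -> List[float]:
    """Pool adjacent violators so the ratios d_i / n_i become non-decreasing."""
    # Each block holds [pooled detections, pooled injections, durations pooled].
    blocks: List[List[int]] = []
    for d, n in zip(detections, injections):
        blocks.append([d, n, 1])
        # Merge while the previous block's ratio exceeds the newest block's ratio
        # (cross-multiplied to avoid division).
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            d_last, n_last, c_last = blocks.pop()
            blocks[-1][0] += d_last
            blocks[-1][1] += n_last
            blocks[-1][2] += c_last
    result: List[float] = []
    for d, n, count in blocks:
        result.extend([d / n] * count)
    return result


# Detection and injection counts from the table above.
D = [6, 10, 11, 19, 20, 17, 25, 24, 18, 19, 10, 10, 10]
N = [30, 30, 30, 30, 29, 28, 29, 24, 18, 19, 10, 10, 10]

for value in monotone_estimates(D, N):
    print(f"{value:.3f}")
```

For these counts the only violation is between the 100 μs and 316.22 μs durations, so those two estimates are replaced by their pooled ratio.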

Theoretically: 

$$\mu(F_L) = \int_0^\infty [1 - F_L(t)]\, dt$$

$$\sigma^2(F_L) = 2 \int_0^\infty t\, [1 - F_L(t)]\, dt - [\mu(F_L)]^2$$

The following are approximations to the mean and variance: 

$$\hat{\mu}(F_L) \approx \sum_{i=1}^{k} [1 - \hat{F}_L(w_i)]\, (w_i - w_{i-1})$$

$$\hat{\sigma}^2(F_L) \approx \sum_{i=1}^{k} 2 w_i\, [1 - \hat{F}_L(w_i)]\, (w_i - w_{i-1}) - [\hat{\mu}(F_L)]^2$$

The following values of $\mu$ and $\sigma$ were obtained: 

    μ = 0.216 ms
    σ = 0.451 ms

FUTURE EXPERIMENTAL DIRECTIONS 


Improvements in Measuring the Effectiveness of the 
Operating System's Transient/Permanent Fault Discriminator 

A key factor in evaluating the effectiveness of the operating system's 
ability to discriminate between transient and permanent faults is determining 
whether an injected transient fault is transient benign or transient 
persistent. In this section a simple modification to the experimental 
approach is described which enables a more accurate determination of the type 
of the fault. 

The recommended change in the experimental method is: 

(1) disable the reconfiguration process of the SIFT operating system so 
that reconfiguration does not occur. 

(2) instrument the operating system to record the time that 
reconfiguration would normally have occurred. 

In this way the error propagation process can be observed for a greater 
amount of time. The possible results of an injection are now: 

Case 1: no reconfiguration

    --+----+----+----+----+--- ... ---+------------------------> t
      s    e1   e2   e3   e4          en

      |<-------------- Z* ----------->|

Case 2: reconfiguration occurs

    --+----+----+----+----+--- ... ---+--- ... ---+--- ... ---+----> t
      s    e1   e2   e3   e4          r           en          C

      |<-------------------- Z* ----------------->|

      |<-------------- R ------------>|

where

    s     =  time injection begins
    e_i   =  time of detection of the ith error (1 <= i <= n)
    r     =  time the operating system reconfigures
    C     =  censoring point of the experiment
    Z*    =  e_n - s
    R     =  r - s

By observing the error generation process after the reconfiguration time 
r until the censoring point C, we obtain the unconditional Z* directly. 
Therefore, the incorrect diagnosis of a transient fault as a permanent can be 
more accurately discerned. If the errors disappear at some point after the 
reconfiguration point, then the diagnosis that the fault was permanent was 
wrong. Similarly, the classification of the transient faults into transient- 
benign and transient-persistent is simplified. 

Measuring F_Z(z) and the Consequent Refined Analysis Method 

In this section, a method of measuring F_Z(z), the distribution of the 
duration of natural (i.e., non-injected) transient errors, is introduced. The 
implications of such an experiment are far-reaching. First, the need to 
assume a simple stuck-at-1 pin-level fault is removed. Second, the need for 
assuming some underlying distribution F_W(w) for the duration of the faults 
is eliminated. The response of the operating system to transient faults of 
varying durations created by physical pin-level injections no longer has to 
be measured. The observed response of the operating system to the natural 
transient faults can be directly entered into the reliability model. The 
effectiveness of any modifications to the transient/permanent fault 
discrimination algorithm can be measured by artificially introducing error 
detections. These artificial error detections can be introduced by changing 
memory locations in the SIFT processors while they are executing. The 
patterns of error detections to be introduced artificially can be inferred 
from the sequences of error detections observed from natural transient 
faults. 

The following approach is suggested for measurement of F_Z(z). Disengage 
SIFT's reconfiguration algorithm and let the system run continuously for many 
years. Instrument the operating system with the same data gathering code as 
described in the section "EXPERIMENTAL APPROACH" and collect as many natural 
transient faults as possible. (Note that the distribution F_Y can be 
determined from the distributions F_L and F_Z using the above relationship 
between the random variables.) If the transient fault arrival rate is 
5 x 10^-3 per hour, about 40 transient faults should be observed in a year. 
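
The yearly count quoted above is the arrival rate multiplied by the hours in a year, and the observed durations can then be turned into an empirical estimate of F_Z(z). The Python sketch below illustrates both steps. The function names are hypothetical, and the empirical distribution function is one straightforward estimator chosen here for illustration, not a method prescribed by the experiment.

```python
from typing import Callable, Sequence

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def expected_faults(rate_per_hour: float, hours: float) -> float:
    """Expected number of arrivals for a constant arrival rate."""
    return rate_per_hour * hours


def empirical_cdf(durations: Sequence[float]) -> Callable[[float], float]:
    """Return F(z) = fraction of observed durations that are <= z."""
    if not durations:
        raise ValueError("at least one observation is required")
    xs = sorted(durations)
    n = len(xs)

    def cdf(z: float) -> float:
        return sum(1 for x in xs if x <= z) / n

    return cdf


# 5e-3 faults per hour over one year gives roughly 44 expected faults,
# in line with the "about 40" quoted in the text.
yearly = expected_faults(5e-3, HOURS_PER_YEAR)
```

With only about 40 observations per year, the estimate of F_Z(z) remains coarse, which is consistent with the recommendation to run the system for many years.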

CONCLUDING REMARKS 

A detailed description of the preliminary transient fault experiment, 
along with the results from 297 transient injections, is given. Although not 
enough data were obtained to draw statistically significant conclusions, the 
foundation has been laid for a large-scale transient fault experiment. 

Several changes in the experimental procedure are recommended for the 
large-scale experiment in order to increase its usefulness. The sensitivity 
of the probability of system failure to the mean duration of the transient 
faults reveals the pressing need for credible measurements of transient 
fault behavior. 

APPENDIX A 


SIFT Chip Failure Rates (x 10^-6 hr^-1) 


Chip Type    # pins  Rate/Chip      Chip Type    # pins  Rate/Chip

54S30          14     0.1654        54S20          14     0.1913
5440           14     0.2132        54S244         20     0.3099
54LS30         12     0.1870        54LS20         12     0.1913
54LS21         12     0.1913        4001B          14     0.2387
4093B          14     0.2388        5410           14     0.2397
54LS11         14     0.2404        54LS10         14     0.2404
54LS27         14     0.2404        54S10          14     0.2406
54126          14     0.2424        5438           14     0.2424
54125          14     0.2424        54LS86         14     0.2432
54LS02         14     0.2432        54LS09         14     0.2432
54LS08         14     0.2432        54LS33         14     0.2432
54LS32         14     0.2432        54LS125        14     0.2432
54LS00         14     0.2432        54LS126        14     0.2432
54S37          14     0.2437        54S86          14     0.2437
54S08          14     0.2437        54S32          14     0.2437
54S02          14     0.2437        54S00          14     0.2437
54LS122        12     0.2100        54LS53         14     0.2456
54155          16     0.2815        5404           14     0.2472
54LS51         14     0.2479        54LS04         14     0.2479
54S04          14     0.2495        54S51          14     0.2495
70C96          16     0.2899        5437           16     0.2916
5474           14     0.2580        54LS74         14     0.2584
54LS74A        14     0.2584        7837           16     0.2964
54S74          14     0.2615        54C175         16     0.3001
54C174         16     0.3010        54LS93         10     0.1883
54LS279        16     0.3012        54LS367        16     0.3012
54LS368        16     0.3012        54LS113        14     0.2643
54LS92         10     0.1895        7835           16     0.3047
DS1651         16     0.3070        54S113         14     0.2702
7603.2         16     0.3098        54S288         16     0.3098
5331           16     0.3098        HM7603         16     0.3098
HD6440A.2      18     0.3484        54LS288        16     0.3119
54LS158        16     0.3121        54LS155        16     0.3121
54LS157        16     0.3121        54LS257        16     0.3121
54156          16     0.3123        54LS138        16     0.3135
54LS153        16     0.3135        54LS253        16     0.3135
54LS112        16     0.3135        54LS109        16     0.3135
54LS352        16     0.3135        54LS151        16     0.3149
54LS251        16     0.3149        54LS139        16     0.3163
AM2902         16     0.3176        2902           16     0.3176
54182          16     0.3189        54LS123        16     0.3189
54S112         16     0.3194        54S153         16     0.3194
54S253         16     0.3194        54S151         16     0.3217
54LS175        16     0.3240        LM119D         14     0.2859
54175          16     0.3271        54LS148        16     0.3301
54LS148        16     0.3301        54LS164        14     0.2891
54LS241        20     0.4129        54LS240        20     0.4129
54LS244        20     0.4129        29LS18         16     0.3313
54S240         20     0.4179        54LS174        16     0.3383
54S175         16     0.3385        DM7136         16     0.3392
5485           16     0.3392        7136           16     0.3392
54LS245        20     0.4242        25LS2518       16     0.3405
7611.2         16     0.3456        54LS393        14     0.3049
54LS194        16     0.3507        5496           16     0.3542
54LS298        16     0.3552        7131           16     0.3583
54LS161A       16     0.3620        54LS161        16     0.3620
25LS2537       20     0.4530        54LS163        16     0.3632
9410           18     0.4093        54LS191        16     0.3643
54LS259        16     0.3643        54LS390        16     0.3654
54LS169        16     0.3654        75107B         14     0.3209
MHQ3467        14     0.3209        54LS290        16     0.3677
54LS165        16     0.3677        54LS273        20     0.4621
54LS377        20     0.4632        54LS374        20     0.4711
25LS2536       20     0.4803        54273          20     0.4859
25LS377        20     0.4873        SE555F          8     0.1953
55471J          8     0.1953        54S471         20     0.4918
AM25LS2569     20     0.4969        54LS381        20     0.4981
75109A         14     0.3516        25LS2517       20     0.5030
CA3039         12     0.3059        54LS299        20     0.5169
75109          14     0.3809        2911           20     0.5504
9407           24     0.6663        7641.2         23     0.6538
54S472         20     0.5771        HM7643         18     0.5216
AM2940DM       28     0.8395        HM6514.2       18     0.6013
2914           40     1.3708        AM2812         28     1.0408
2901A          40     1.4945        93L422         22     0.9420
2901A          40     1.8233        AM2901A        40     1.8233
LM193           8     0.3770        2716           24     1.4030
MK4114.3       18     1.1236        LM741           7     0.4906
LF156H          8     0.7382        QT6T9           4     0.4906
LN120H.5        3     0.4348

APPENDIX B 


Summary of Fault Injections 

This section includes a summary of the 279 transient fault injections in 
tabular form. The information included under each heading is: 

INJ      - injection number 

RECTO    - the time in milliseconds from the injection until the system 
           reconfigured. 

TWIDTH   - the duration of the injection in milliseconds. 

FIRSTE   - the time in milliseconds from the injection until the first error 
           detection on another processor. The notation ... indicates that 
           no error was detected. 

LASTE    - the time in milliseconds from the injection until the last error 
           detection on another processor. 

           -->   indicates last error same as reconfiguration time 
           ...   indicates no errors detected 
           -x+R  indicates the last error was x milliseconds before 
                 reconfiguration 

TYP      - the type of fault: SA1 = +5 volts, SA0 = -5 volts. 

LOCATION - the processor, board, chip and pin where fault was injected. 


INJ   RECTO   TWIDTH(ms)   FIRSTE         LASTE          TYP   LOCATION

  1    244       0.001     20             -->            SA1   P1 CPU U35 2
  2    ...       0.001     ...            ...            SA1   P1 CPU U35 2
  3    ...       0.001     ...            ...            SA1   P1 CPU U35 2
  4    ...       0.001     ...            ...            SA1   P1 CPU U35 2
  5    236       0.001     12             [illegible]    SA1   P1 CPU U35 2
  6    ...       0.001     ...            ...            SA0   P1 CPU U35 2
  7    ...       0.001     ...            ...            SA0   P1 CPU U35 2
  8    ...       0.001     ...            ...            SA0   P1 CPU U35 2
  9    216       0.001     [?]2           -->            SA0   P1 CPU U35 2
 10    ...       0.001     ...            ...            SA0   P1 CPU U35 2
 11    ...       0.003     [illegible]    [illegible]    SA1   P1 CPU U35 2
 12    284       0.003     23             -->            SA1   P1 CPU U35 2
 13    ...       0.003     ...            ...            SA1   P1 CPU U35 2
 14    ...       0.003     ...            ...            SA1   P1 CPU U35 2
 15    574       0.003     35             -40+R          SA1   P1 CPU U35 2
 16    ...       0.003     ...            ...            SA0   P1 CPU U35 2
 17    ...       0.003     ...            ...            SA0   P1 CPU U35 2
 18    ...       0.003     ...            ...            SA0   P1 CPU U35 2
 19    216       0.003     1              -->            SA0   P1 CPU U35 2
 20    ...       0.003     ...            ...            SA0   P1 CPU U35 2
 21    282       0.010     21             -1+R           SA1   P1 CPU U35 2
 22    259       0.010     4              -108+R         SA1   P1 CPU U35 2
 23    193       0.010     5              -->            SA1   P1 CPU U35 2
 24    ...       0.010     ...            ...            SA1   P1 CPU U35 2
 25    214       0.010     25             -->            SA1   P1 CPU U35 2
 26    ...       0.010     ...            ...            SA0   P1 CPU U35 2
 27    ...       0.010     ...            ...            SA0   P1 CPU U35 2
 28    ...       0.010     ...            ...            SA0   P1 CPU U35 2
 29    ...       0.010     ...            ...            SA0   P1 CPU U35 2
 30    ...       0.010     ...            ...            SA0   P1 CPU U35 2
 31    ...       0.032     129            279            SA1   P1 CPU U35 2
 32    253       0.032     2              [illegible]    SA1   P1 CPU U35 2
 33    ...       0.032     ...            ...            SA1   P1 CPU U35 2
 34    255       0.032     3              -->            SA1   P1 CPU U35 2
 35    239       0.032     15             -73+R          SA1   P1 CPU U35 2
 36    250       0.032     26             -->            SA0   P1 CPU U35 2
 37    ...       0.032     ...            ...            SA0   P1 CPU U35 2
 38    ...       0.032     ...            ...            SA0   P1 CPU U35 2
 39    274       0.032     13             -1+R           SA0   P1 CPU U35 2
 40    190       0.032     1              -1+R           SA0   P1 CPU U35 2
 41    241       0.100     17             -1+R           SA1   P1 CPU U35 2
 42    221       0.100     104            -->            SA1   P1 CPU U35 2
 44    284       0.100     23             -6+R           SA1   P1 CPU U35 2
 45    237       0.100     13             -1+R           SA1   P1 CPU U35 2
 46    ...       0.100     ...            ...            SA0   P1 CPU U35 2
 47    287       0.100     26             -1+R           SA0   P1 CPU U35 2
 48    246       0.100     21             -1+R           SA0   P1 CPU U35 2
 49    ...       0.100     ...            ...            SA0   P1 CPU U35 2
 50    188       0.100     3              -->            SA0   P1 CPU U35 2
 51    253       0.316     1              -1+R           SA1   P1 CPU U35 2
 52    ...       0.316     8              18             SA1   P1 CPU U35 2
 53    ...       0.316     ...            ...            SA1   P1 CPU U35 2
 54    261       0.316     3              -33+R          SA1   P1 CPU U35 2
 55    221       0.316     3              -->            SA1   P1 CPU U35 2
 56    248       0.316     24             -107+R         SA0   P1 CPU U35 2
 57    ...       0.316     ...            ...            SA0   P1 CPU U35 2
 58    231       0.316     114            -->            SA0   P1 CPU U35 2
 60    184       0.316     2              -72+R          SA0   P1 CPU U35 2
 61    215       1.000     26             -1+R           SA1   P1 CPU U35 2
 62    216       1.000     2              -->            SA1   P1 CPU U35 2
 63    ...       1.000     ...            ...            SA1   P1 CPU U35 2
 64    215       1.000     26             -1+R           SA1   P1 CPU U35 2
 65    227       1.000     3              -->            SA1   P1 CPU U35 2
 71    264       3.160     3              -->            SA1   P1 CPU U35 2
 72    286       3.160     25             [illegible]    SA1   P1 CPU U35 2
 73    252       3.160     1              -32+R          SA1   P1 CPU U35 2
 74    244       3.160     20             -1+R           SA1   P1 CPU U35 2
 75    278       3.160     17             -->            SA1   P1 CPU U35 2
 81    280       1.000     19             -->            SA0   P1 CPU U35 2
 82    ...       1.000     ...            ...            SA0   P1 CPU U35 2
 83    262       1.000     4              -->            SA0   P1 CPU U35 2
 84    263       1.000     2              -->            SA0   P1 CPU U35 2
 85    257       1.000     2              -47+R          SA0   P1 CPU U35 2
 86    262       3.162     4              -1+R           SA0   P1 CPU U35 2
 87    253       3.162     2              -->            SA0   P1 CPU U35 2
 88    281       3.162     20             [illegible]    SA0   P1 CPU U35 2
 89    280       3.162     19             -1+R           SA0   P1 CPU U35 2
 90    220       3.162     3              -->            SA0   P1 CPU U35 2
 91    271      10.000     10             -->            SA1   P1 CPU U35 2
 92    219      10.000     2              -->            SA1   P1 CPU U35 2
 93    285      10.000     24             -->            SA1   P1 CPU U35 2
 94    216      10.000     1              -->            SA1   P1 CPU U35 2
 95    247      10.000     23             -1+R           SA1   P1 CPU U35 2
 96    244      10.000     20             -1+R           SA0   P1 CPU U35 2
 97    278      10.000     17             -->            SA0   P1 CPU U35 2
 98    197      10.000     8              -->            SA0   P1 CPU U35 2
100    279      10.000     19             -->            SA0   P1 CPU U35 2
101    261      31.620     4              -->            SA1   P1 CPU U35 2
102    224      31.620     107            [illegible]    SA1   P1 CPU U35 2
103    215      31.620     1              [illegible]    SA1   P1 CPU U35 2
104    278      31.620     17             -->            SA1   P1 CPU U35
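
The LASTE column mixes absolute times with values expressed relative to the reconfiguration time. As an aid to reading the table, the following Python sketch converts a LASTE entry into an absolute time in milliseconds. The function name is hypothetical, and it assumes the arrow notation is transcribed as `-->`, the no-detection notation as `...`, and relative entries as `-x+R`.

```python
from typing import Optional


def last_error_ms(laste: str, recto: Optional[float]) -> Optional[float]:
    """Absolute time of the last error in ms, or None if no error was detected.

    laste -- LASTE entry: '...', '-->', '-x+R', or a plain number
    recto -- RECTO entry in ms, or None when the system did not reconfigure
    """
    token = laste.replace(" ", "")
    if token == "...":
        return None
    if token == "-->" or token.endswith("+R"):
        if recto is None:
            raise ValueError("LASTE refers to R but RECTO is not available")
        # '-->' means the last error coincides with reconfiguration;
        # '-x+R' means it occurred x ms before reconfiguration.
        offset = 0.0 if token == "-->" else float(token[:-2])
        return recto + offset
    return float(token)
```

For example, an entry with RECTO 574 and LASTE `-40+R` corresponds to a last error 534 ms after injection.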

APPENDIX C 


SURE Model 


LAMBDA = 1E-4; 
K = 10.0; 
GAMMA = K*LAMBDA; 
MU_W = 1E-9 TO* 1E-1 BY 10; 
BETA = 2*MU_W; 

W0 = 0.0;           W1 = 1E-6;           W2 = 3.16E-6;       W3 = 1E-5; 
W4 = 31.62E-6;      W5 = 100.0E-6;       W6 = 316.22E-6;     W7 = 1.0E-3; 
W8 = 3.162E-3;      W9 = 10E-3;          W10 = 31.62E-3;     W11 = 100E-3; 
W12 = 316.22E-3;    W13 = 1000.0E-3; 

ERW0 = 239.0;       ERW1 = 239.0;        ERW2 = 350.889;     ERW3 = 239.445; 
ERW4 = 255.875;     ERW5 = 247.150;      ERW6 = 238.333;     ERW7 = 284.320; 
ERW8 = 256.916;     ERW9 = 249.111;      ERW10 = 248.444;    ERW11 = 255.778; 
ERW12 = 251.500;    ERW13 = 236.700; 

ER2W0 = 57282.6;    ER2W1 = 57282.602;   ER2W2 = 181682.0;   ER2W3 = 58599.2; 
ER2W4 = 68679.1;    ER2W5 = 62566.7;     ER2W6 = 57448.1;    ER2W7 = 108444.4; 
ER2W8 = 69975.8;    ER2W9 = 62921.8;     ER2W10 = 62708.1;   ER2W11 = 66016.7; 
ER2W12 = 64170.5;   ER2W13 = 56681.3; 

EZW0 = 0.0;         EZW1 = 0.1;          EZW2 = 66.333;      EZW3 = 0.0; 
EZW4 = 47.286;      EZW5 = 0.0;          EZW6 = 2.462;       EZW7 = 0.0; 
EZW8 = 0.0;         EZW9 = 0.0;          EZW10 = 282.000;    EZW11 = 99.000; 
EZW12 = 0.0;        EZW13 = 0.0; 

EZ2W0 = .160;       EZ2W1 = .160;        EZ2W2 = 92402.3;    EZ2W3 = 0.0; 
EZ2W4 = 15875.0;    EZ2W5 = 0.0;         EZ2W6 = 40.00;      EZ2W7 = 0.0; 
EZ2W8 = 0.0;        EZ2W9 = 0.0;         EZ2W10 = 79524.0;   EZ2W11 = 9801.0; 
EZ2W12 = 0.0;       EZ2W13 = 0.0; 

PRW0 = 0.0;         PRW1 = .17;          PRW2 = .30;         PRW3 = .37; 
PRW4 = .53;         PRW5 = .69;          PRW6 = .54;         PRW7 = .86; 
PRW8 = 1.00;        PRW9 = 1.00;         PRW10 = .95;        PRW11 = .90; 
PRW12 = 1.00;       PRW13 = 1.00; 

FW0 = 1;  IF W0 < BETA THEN FW0 = W0/BETA; 
FW1 = 1;  IF W1 < BETA THEN FW1 = W1/BETA; 
FW2 = 1;  IF W2 < BETA THEN FW2 = W2/BETA; 
FW3 = 1;  IF W3 < BETA THEN FW3 = W3/BETA; 
FW4 = 1;  IF W4 < BETA THEN FW4 = W4/BETA; 
FW5 = 1;  IF W5 < BETA THEN FW5 = W5/BETA; 
FW6 = 1;  IF W6 < BETA THEN FW6 = W6/BETA; 
FW7 = 1;  IF W7 < BETA THEN FW7 = W7/BETA; 
FW8 = 1;  IF W8 < BETA THEN FW8 = W8/BETA; 
FW9 = 1;  IF W9 < BETA THEN FW9 = W9/BETA; 
FW10 = 1; IF W10 < BETA THEN FW10 = W10/BETA; 
FW11 = 1; IF W11 < BETA THEN FW11 = W11/BETA; 
FW12 = 1; IF W12 < BETA THEN FW12 = W12/BETA; 
FW13 = 1; IF W13 < BETA THEN FW13 = W13/BETA; 


MU_R = 
   ( FW1 - FW0 ) * ERW1 + ( FW2 - FW1 ) * ERW2 + 
   ( FW3 - FW2 ) * ERW3 + ( FW4 - FW3 ) * ERW4 + 
   ( FW5 - FW4 ) * ERW5 + ( FW6 - FW5 ) * ERW6 + 
   ( FW7 - FW6 ) * ERW7 + ( FW8 - FW7 ) * ERW8 + 
   ( FW9 - FW8 ) * ERW9 + ( FW10 - FW9 ) * ERW10 + 
   ( FW11 - FW10 ) * ERW11 + ( FW12 - FW11 ) * ERW12 + 
   ( FW13 - FW12 ) * ERW13; 

SIGMA_R = SQRT( 
   ( FW1 - FW0 ) * ER2W1 + ( FW2 - FW1 ) * ER2W2 + 
   ( FW3 - FW2 ) * ER2W3 + ( FW4 - FW3 ) * ER2W4 + 
   ( FW5 - FW4 ) * ER2W5 + ( FW6 - FW5 ) * ER2W6 + 
   ( FW7 - FW6 ) * ER2W7 + ( FW8 - FW7 ) * ER2W8 + 
   ( FW9 - FW8 ) * ER2W9 + ( FW10 - FW9 ) * ER2W10 + 
   ( FW11 - FW10 ) * ER2W11 + ( FW12 - FW11 ) * ER2W12 + 
   ( FW13 - FW12 ) * ER2W13 - MU_R*MU_R ); 

MU_Z = 
   ( FW1 - FW0 ) * EZW1 + ( FW2 - FW1 ) * EZW2 + 
   ( FW3 - FW2 ) * EZW3 + ( FW4 - FW3 ) * EZW4 + 
   ( FW5 - FW4 ) * EZW5 + ( FW6 - FW5 ) * EZW6 + 
   ( FW7 - FW6 ) * EZW7 + ( FW8 - FW7 ) * EZW8 + 
   ( FW9 - FW8 ) * EZW9 + ( FW10 - FW9 ) * EZW10 + 
   ( FW11 - FW10 ) * EZW11 + ( FW12 - FW11 ) * EZW12 + 
   ( FW13 - FW12 ) * EZW13; 

SIGMA_Z = SQRT( 
   ( FW1 - FW0 ) * EZ2W1 + ( FW2 - FW1 ) * EZ2W2 + 
   ( FW3 - FW2 ) * EZ2W3 + ( FW4 - FW3 ) * EZ2W4 + 
   ( FW5 - FW4 ) * EZ2W5 + ( FW6 - FW5 ) * EZ2W6 + 
   ( FW7 - FW6 ) * EZ2W7 + ( FW8 - FW7 ) * EZ2W8 + 
   ( FW9 - FW8 ) * EZ2W9 + ( FW10 - FW9 ) * EZ2W10 + 
   ( FW11 - FW10 ) * EZ2W11 + ( FW12 - FW11 ) * EZ2W12 + 
   ( FW13 - FW12 ) * EZ2W13 - MU_Z*MU_Z ); 

P_R = 
   ( FW1 - FW0 ) * PRW1 + ( FW2 - FW1 ) * PRW2 + 
   ( FW3 - FW2 ) * PRW3 + ( FW4 - FW3 ) * PRW4 + 
   ( FW5 - FW4 ) * PRW5 + ( FW6 - FW5 ) * PRW6 + 
   ( FW7 - FW6 ) * PRW7 + ( FW8 - FW7 ) * PRW8 + 
   ( FW9 - FW8 ) * PRW9 + ( FW10 - FW9 ) * PRW10 + 
   ( FW11 - FW10 ) * PRW11 + ( FW12 - FW11 ) * PRW12 + 
   ( FW13 - FW12 ) * PRW13; 

MU_RP = MU_R; 
SIGMA_RP = SIGMA_R; 

(* convert to hours *) 

MS_PER_HOUR = 1E3*60*60; 

MU_R = MU_R/MS_PER_HOUR; 
SIGMA_R = SIGMA_R/MS_PER_HOUR; 
MU_Z = MU_Z/MS_PER_HOUR; 
SIGMA_Z = SIGMA_Z/MS_PER_HOUR; 
MU_RP = MU_RP/MS_PER_HOUR; 
SIGMA_RP = SIGMA_RP/MS_PER_HOUR; 

SHOW MU_R, SIGMA_R, MU_Z, SIGMA_Z, MU_RP, SIGMA_RP; 

1,2 = 4*GAMMA; 
2,3 = 3*GAMMA + 3*LAMBDA; 
1,4 = 4*LAMBDA; 
4,5 = 3*GAMMA; 
2,5 = 3*LAMBDA; 
4,6 = 3*LAMBDA + 3*GAMMA; 
2,7 = <MU_R, SIGMA_R, P_R>; 
2,1 = <MU_Z, SIGMA_Z, 1-P_R>; 
4,7 = <MU_RP, SIGMA_RP>; 
7,8 = 3*GAMMA; 
8,9 = 2*GAMMA + 2*LAMBDA; 
7,10 = 3*LAMBDA; 
10,12 = 2*LAMBDA + 2*GAMMA; 
10,11 = 2*GAMMA; 
8,12 = 2*LAMBDA; 
8,13 = <MU_R, SIGMA_R, P_R>; 
10,13 = <MU_RP, SIGMA_RP>; 
8,7 = <MU_Z, SIGMA_Z, 1-P_R>; 
13,14 = GAMMA + LAMBDA; 
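
The FW, MU, SIGMA, and P_R statements in the model above follow a single pattern. The fault duration W is treated as uniform on [0, BETA], so that F_W(w) = min(1, w/BETA), and each tabulated quantity is weighted by the probability mass F_W(W_i) - F_W(W_(i-1)) of its duration interval. The Python sketch below restates that computation for illustration. The function names and list arguments are hypothetical stand-ins for the W, ER/EZ, ER2/EZ2, and PR tables, and the clamp of a negative variance to zero is an added guard that is not in the model.

```python
import math
from typing import List, Sequence, Tuple


def uniform_cdf(w: float, beta: float) -> float:
    """F_W(w) for a duration uniform on [0, beta]; mirrors the FWi statements."""
    return w / beta if w < beta else 1.0


def weighted_moments(
    w: Sequence[float],       # duration grid W0..Wn
    first: Sequence[float],   # conditional means (ERWi or EZWi)
    second: Sequence[float],  # conditional second moments (ER2Wi or EZ2Wi)
    prob: Sequence[float],    # conditional reconfiguration probabilities (PRWi)
    beta: float,
) -> Tuple[float, float, float]:
    """Return (mean, standard deviation, probability) mixed over F_W."""
    fw: List[float] = [uniform_cdf(x, beta) for x in w]
    # Weight for index i is the mass of the interval (W_{i-1}, W_i].
    mass = [fw[i] - fw[i - 1] for i in range(1, len(w))]
    mu = sum(m * first[i] for i, m in enumerate(mass, start=1))
    m2 = sum(m * second[i] for i, m in enumerate(mass, start=1))
    p = sum(m * prob[i] for i, m in enumerate(mass, start=1))
    sigma = math.sqrt(max(m2 - mu * mu, 0.0))
    return mu, sigma, p
```

Because index 0 only anchors the first interval, the values tabulated for W0 never enter the sums, which matches the model statements starting at the FW1 - FW0 term.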